I. INTRODUCTION


A. BACKGROUND 

Adjusting the frequency content or pitch of a signal 
is a topic researched within the audio field. The hearing 
impaired community has the greatest interest in the 
applications of frequency modification or transposition 
techniques. This is due to their need for auditory speech- 
processing aids. 

Auditory speech-processing aids are divided into two 
groups: those which involve nonradical processing of the 
speech signal, with the speech still intelligible to a 
person with normal hearing, and those which involve radical 
re-coding of the speech signal [Ref. 1:pp. 537-557].

An example of radical recoding involves such systems as
cochlear implants, where the normal speech signal is
processed into a series of vibrations that the brain
interprets as sound. Individuals who have this type of aid
surgically inserted in their cochlea must learn a
completely different language than a person with normal
hearing. Examples of nonradical processing aids include the
most widely used amplifier aids and the less familiar
frequency lowering devices or frequency transposition
systems.


Most hearing aids amplify sound. Some aids may amplify
or soften certain frequencies, while others transmit sound
from the aid on one ear to the aid on the other ear. Their
primary purpose, in either case, is to amplify everything
they are capable of sensing. In this thesis, however, we
are interested in developing an algorithm that may someday
drive an aid which lowers the frequency content and
preserves the intelligibility of the speech signal.


B. FREQUENCY MODIFICATION 
Pickett [Ref. 2:pp. 191-194] categorizes two basic
methods that have been used for frequency lowering:

1. Frequency transposition, where a portion of the
signal is separated out and resynthesized in a lower
frequency band.

2. Frequency division, where the frequency of the
signal is reduced by a fixed ratio.


All of the methods involve signal distortion. Signal 
distortion tends to increase with greater frequency shifts. 
Here we are concerned primarily with the idea of moderate 
frequency transposition, where the signal is shifted without 
major distortions in the information content. 

The earliest known suggestion of frequency lowering was
by Perwitschky (1925). The earliest transposing hearing aid
was built and tested by Johansson (1955). Since then,
several other systems have been built and tested, but
considering the advances and trends of current technology,
research in the area of frequency transposition of speech
has not been productive.

Frequency transposition systems have utilized analog 
techniques such as frequency modulation (shifting an upper 
band to a lower band); frequency division (a slow playback 
of a tape recorded signal); and digital techniques such as 
sampling distortion (omitting segments of recorded speech), 
and doppler (the delaying of the incoming signal). Though
these methods have been developed and extensively tested,
the digital approach presented here may produce altogether
different results.

Pickett confirms that the possibilities of usable
frequency shifting algorithms have not been explored
extensively enough to make recommendations for practice
[Ref. 2:p. 193]. The research needs in this area include
obtaining new information on the potential for digital
recoding, exploring the principles of transposition, finding
which general cues can be sent in this way, finding the
optimum parameters, and examining what system can be built
that meets our general and specific needs.


C. A NEW TECHNIQUE FOR FREQUENCY TRANSPOSITION 

Recently, Hall [Ref. 3:p. 3956] postulated that pole
shifting in the z-domain using an auto-regressive (all pole)
model of speech may be a possible option for frequency
lowering. He used linear predictive coding (LPC) techniques
to process the speech to determine if pole shifting was a
viable option. His experimental results were positive
because he was able to create a change in pitch on the input
speech segment.

This thesis is an extension of Hall's research. It
ventures beyond the frequency domain model, and works
directly with the linear predictive time domain model. It
was postulated that a linear relationship exists between
frequency content and the reflection coefficients determined
using LPC. Once this theory had been postulated, a speech
processing experiment was undertaken to determine if the
conjectures made were plausible.


In this report linear prediction is introduced, the
particular algorithms used to process the data are
explained, and the experimental research is described.


Identical phrases of speech, spoken at different pitch 
levels by the same speaker, are sampled and processed. 
Possible patterns existing between the different pitch 
segments of speech and their linear predictive coefficients 
are analyzed. 

The results of this research indicate that there is no
linear relationship between the frequency
content of speech and the LPC reflection coefficients, and
recommendations are made for continued analysis concerning
linear predictive coding and the frequency transposition of
speech.


II. MODELING SPEECH PRODUCTION 


A. INTRODUCTION 

In order to understand speech reproduction and
synthesis, it is useful to consider some of the basic
elements that combine to produce speech. The most
elementary model used to explain the production of speech
is the human model illustrated below as Figure 1.
[Diagram labels: nasal tract, velum, vocal tract, tongue,
epiglottis, esophagus, vocal cords, lungs]

Human Speech Production System [Ref. 4:p. 42].

Figure 1.


The lungs produce the air flow necessary to begin the
generation of sound. The vocal cords, tongue, mouth, lips
and nasal tract combine their different properties to shape
the airflow to produce the speech waveform we hear.


B. THE SPEECH PRODUCTION MODEL 


Evans [Ref. 4:pp. 40-45] relates the several human
functions to mechanical models. This is standard practice
and a widely accepted approach to speech production
modeling. He states that the lungs are the excitation
source for the vocal and nasal tract areas. An excitation
source can be modeled as either a pulse train generator or a
random number generator when reproducing speech.
In the case of voiced sounds (e.g., consonants, vowels or
nasal sounds), the air released by the lungs is periodically
modulated by vibrations from the vocal cords, glottis, and
velum. Thus the excitation model in this case is a pulse
generator. In the case of unvoiced sounds (e.g., sh, sss,
fff), which require no vibrations to be produced, the modeled
excitation source is a random number generator.

The excitation sources produce a quasi-periodic wave
form that we recognize as speech. That is, the period of
the wave form varies with time depending on the sound being
produced. This periodicity is most obvious in the production
of voiced or vibrated sounds. Figure 2, a general
discrete-time model of the human speech process, illustrates
this point more clearly. Here we have represented the vocal
tract model as a time-varying digital filter.

Discrete-time Model for Speech Production

Figure 2.

Note that the pulse train has an input labeled pitch
period. This input determines when the pulses will be
emitted from the pulse generator and at what periodicity.
This is only necessary for voiced speech.

The unvoiced speech is a continuous stream of random
numbers commonly referred to as white noise. The flow of
random numbers may produce a seemingly quasi-periodic sound;
however, since they are usually of such short duration, we
consider the sound to be continuous and constant, and not
periodic.

Each speech waveform has a specific amount of energy.
The energy contained within each utterance of a set duration
will be referred to as gain (G). This is what gives speech
its quality. It also aids reproduction by
indicating the intensity or inflection of the voice signal.

Once the voiced or unvoiced decision is made and an
energy or gain is assigned, the scaled excitation function
drives the vocal tract model. In a phone interview,
James Kaiser of Bell Laboratories mentioned that
current thinking in the area of speech reproduction has
refocused its attention on this portion of the model and
that there is a movement to more clearly describe the
physics behind the different physical contributors of
speech.

This vocal tract model is driven by the excitation and
energy function and controlled by time varying vocal tract
parameters. These vocal tract parameters adjust the vocal
tract model to yield the desired output waveform. By
replacing the vocal tract model with an equivalent
time-varying digital filter that models the vocal tract model's
response, we are able to step right into the next phase of
synthetic speech reproduction.


C. DIGITAL FILTER REPRESENTATION 

Although speech is modeled most efficiently by poles and
zeros, it may also be modeled accurately by an
auto-regressive (all pole) filter if the order of the filter is
large enough. For example, a tenth order auto-regressive
filter will accurately model most audible sounds.
Therefore, the transfer function H(z) of the digital
filter in Figure 2 is shown as Eq. 2-1.

$$H(z) = \frac{G}{1 - \sum_{k=1}^{p} a_k z^{-k}} \tag{2-1}$$

where p is the order of the filter, G is the gain, and $a_k$ is
the filter coefficient.

G and $a_k$ are the time-varying vocal tract parameters for
this filter. For a given segment of time (i.e., 10
milliseconds) the vocal tract parameters are constant. However,
when these segments are strung together in rapid succession to
produce a one second interval of speech, the parameters will
change 100 times. This is why they are referred to as time
varying; they vary over a short period of time.

The type of digital filter used in Figure 2 is
arbitrary. It is the concept behind the diagram that
counts. For the purposes of this research, the properties
and attributes of a time-varying lattice filter are best
because they lend themselves well to linear predictive
coding implementation.
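
As a rough illustration of Eq. 2-1, the following Python sketch drives
an all-pole filter with a pulse train. It is not taken from the thesis:
the names `all_pole_filter` and `pulse_train`, the use of a single fixed
frame of coefficients, and the example values are assumptions chosen
only to show the recursion s(n) = G u(n) + sum of a_k s(n-k) that the
minus sign in the denominator of Eq. 2-1 implies.

```python
def pulse_train(n_samples, pitch_period):
    """Unit pulses spaced pitch_period samples apart (voiced excitation)."""
    return [1.0 if n % pitch_period == 0 else 0.0 for n in range(n_samples)]


def all_pole_filter(excitation, a, gain):
    """Apply H(z) = G / (1 - sum_k a_k z^-k) to one frame of excitation.

    a[0] holds a_1, a[1] holds a_2, and so on (sign convention of Eq. 2-1).
    """
    out = []
    for n, u in enumerate(excitation):
        acc = gain * u
        for k, ak in enumerate(a, start=1):
            if n - k >= 0:
                acc += ak * out[n - k]  # feedback from past outputs
        out.append(acc)
    return out


if __name__ == "__main__":
    # Hypothetical first-order example: one pole at z = 0.5, gain 2.
    print(all_pole_filter(pulse_train(8, 4), [0.5], 2.0))
```

In a time-varying setting the coefficients and gain would simply be
replaced frame by frame while the output history is carried over.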
III. LINEAR PREDICTION THEORY


A. WHY LINEAR PREDICTION? 

Although spectral analysis is a well-known technique for 
studying signals, its application to speech signals suffers 
from a number of serious limitations arising from the 
nonstationary as well as the quasiperiodic properties of the 
speech wave. By modeling the speech wave itself, rather 
than its spectrum, we avoid the problems inherent in
frequency-domain methods. 

For instance, traditional Fourier analysis methods 
require a relatively long speech segment to provide adequate 
spectral resolution. As a result, rapidly changing speech 
events cannot be accurately followed [Ref. 5:pp. 276-294].

Linear predictive coding is applicable to a wide range 
of research problems including speech production and 
perception. One of the main objectives in any speech 
processing technique is the synthesis of speech which is 
indistinguishable from normal human speech. 

Atal noted that much can be learned about the
information-carrying structure of speech by selectively
altering the properties of the speech signal. He also
stated that LPC techniques can serve as a tool for modifying
the acoustic properties of the speech signal [Ref. 5:p. 276].
These are exactly the intentions of this thesis: to modify
the speech signal by investigating the properties of the
information carrying structure.

The remainder of this chapter is a summary of linear
prediction theory. The major portion of this section is
extracted from Makhoul's tutorial review on linear
prediction [Ref. 6:pp. 124-143], and will be based on an
intuitive approach, with emphasis on the clarity of ideas
rather than mathematical rigor.


B. LPC THEORY 

In applying time series analysis, each continuous signal
s(t) is sampled to obtain a discrete-time signal s(nT), also
known as a time series, where n is an integer variable and T
is the sampling interval. The sampling frequency is then
$f_s = 1/T$. Note that s(nT) will be represented as $s_n$ in this
discussion.

The signal $s_n$ is considered to be the output of some
system with some unknown input $u_n$ such that the following
relation holds:

$$s_n = -\sum_{k=1}^{p} a_k s_{n-k} + G \sum_{l=0}^{q} b_l u_{n-l} \tag{3-1}$$

where $a_k$, $b_l$, and the gain G are the parameters of the
hypothesized system. This equation says that the 'output'
$s_n$ is a linear combination of past outputs and present and
past inputs. That is, the signal $s_n$ is predictable from
linear combinations of past outputs and inputs. Hence the
name linear prediction.


C. PARAMETER ESTIMATION 
In the all-pole model, we assume that the signal $s_n$ is
given as a linear combination of its past values and some
current input $u_n$:

$$s_n = -\sum_{k=1}^{p} a_k s_{n-k} + G u_n \tag{3-2}$$

which yields the following frequency domain transfer
function

$$H(z) = \frac{G}{1 + \sum_{k=1}^{p} a_k z^{-k}} \tag{3-3}$$

Given a particular signal $s_n$, the problem is to determine
the predictor coefficients $\{a_k\}$ and the gain G in some
manner.


1. Method of Least Squares

Here we assume that the input $u_n$ is totally
unknown, which is the case in speech analysis. Therefore,
the signal $s_n$ can at best be approximately predicted from a
linearly weighted summation of past samples. Let the
approximation of $s_n$ be $\hat{s}_n$, where

$$\hat{s}_n = -\sum_{k=1}^{p} a_k s_{n-k} \tag{3-4}$$

Then the error between the actual value $s_n$ and the predicted
value $\hat{s}_n$ is given by

$$e_n = s_n - \hat{s}_n = s_n + \sum_{k=1}^{p} a_k s_{n-k} \tag{3-5}$$

The quantity $e_n$ is also known as the residual. In the
method of least squares the parameters $\{a_k\}$ are obtained as
a result of the minimization of the expected value or mean
of the error squared term, $E_p = \mathcal{E}(e_n^2)$, with respect to
each of the parameters. $E_p$ is the minimum mean square
prediction error, averaged over all n, and is represented by

$$E_p = \mathcal{E}(e_n^2) = \sum_{n} \Bigl( s_n + \sum_{k=1}^{p} a_k s_{n-k} \Bigr)^2 \tag{3-6}$$

For any definition of the signal $s_n$, a set of
equations with a set of unknowns can be solved for the
predictor coefficients which minimize $E_p$.

There are two distinct methods for the estimation of
these parameters, namely the autocorrelation method and the
covariance method. Both methods are clearly described by
Makhoul [Ref. 6:pp. 126-127]. Since the autocorrelation
method is the preferred method, only that method will be
summarized here.
a. Autocorrelation Method

Here we assume that the error $E_p$ is minimized
over an infinite duration. Since

$$R(i) = \sum_{n=-\infty}^{+\infty} s_n s_{n-i} \tag{3-7}$$

is the autocorrelation function of the signal $s_n$,
Equation 3-6 reduces to

$$E_p = R(0) + \sum_{k=1}^{p} a_k R(k) \tag{3-8}$$

where R(0) is the total energy of the input signal and R(k)
is the autocorrelation of the input signal at lag k, the
values of which fill the autocorrelation matrix (see
Figure 3).


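$$\begin{bmatrix}
s_0 s_1 & s_1 s_2 & s_2 s_3 & \cdots & s_{p-1} s_p \\
s_1 s_2 & s_0 s_1 & s_1 s_2 & \cdots & s_{p-2} s_{p-1} \\
s_2 s_3 & s_1 s_2 & s_0 s_1 & \cdots & s_{p-3} s_{p-2} \\
\vdots & \vdots & \vdots & & \vdots \\
s_{p-1} s_p & s_{p-2} s_{p-1} & s_{p-3} s_{p-2} & \cdots & s_0 s_1
\end{bmatrix}
=
\begin{bmatrix}
R_{0,1} & R_{1,2} & R_{2,3} & \cdots & R_{p-1,p} \\
R_{1,2} & R_{0,1} & R_{1,2} & \cdots & R_{p-2,p-1} \\
R_{2,3} & R_{1,2} & R_{0,1} & \cdots & R_{p-3,p-2} \\
\vdots & \vdots & \vdots & & \vdots \\
R_{p-1,p} & R_{p-2,p-1} & R_{p-3,p-2} & \cdots & R_{0,1}
\end{bmatrix}$$

Autocorrelation Matrix

Figure 3.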


It is a symmetric Toeplitz matrix (a Toeplitz
matrix is one in which all the elements along each diagonal
are equal). Since the signal $s_n$ is known over only a finite
interval, one popular method to control the size of the
Toeplitz matrix is to multiply the signal $s_n$ by a window
function $w_n$. This yields a slightly different signal $s'_n$,
which is zero outside the finite interval.

In any case, the autocorrelation matrix is the
means for solving for several of the linear predictive
coefficients needed to analyze and synthesize speech. The
following chapter discusses, in greater depth, what those
coefficients are and how they are obtained.
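
The windowing and autocorrelation steps described above can be
sketched in Python as follows. This is only an illustrative sketch:
the thesis does not say which window function was used, so the Hamming
window here is an assumption, and the helper names `hamming_window`,
`autocorrelation`, and `toeplitz_matrix` are hypothetical.

```python
import math


def hamming_window(n_samples):
    """Hamming window (an assumed choice; the text names no specific window)."""
    if n_samples == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (n_samples - 1))
            for n in range(n_samples)]


def autocorrelation(signal, max_lag):
    """R(i) = sum_n s_n * s_(n-i) for i = 0..max_lag, signal zero outside the frame."""
    n = len(signal)
    return [sum(signal[j] * signal[j - lag] for j in range(lag, n))
            for lag in range(max_lag + 1)]


def toeplitz_matrix(r, order):
    """Symmetric Toeplitz matrix whose (i, j) entry is R(|i - j|)."""
    return [[r[abs(i - j)] for j in range(order)] for i in range(order)]


if __name__ == "__main__":
    frame = [1.0, 2.0, 3.0, 2.0, 1.0]
    windowed = [s * w for s, w in zip(frame, hamming_window(len(frame)))]
    r = autocorrelation(windowed, 2)
    print(toeplitz_matrix(r, 2))
```

Because the windowed frame is zero outside its finite interval, the
infinite sum of Eq. 3-7 collapses to the finite sum used here.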
IV. LINEAR PREDICTION OF SPEECH


A. INTRODUCTION 

As mentioned earlier, there are several ingredients or 
time-varying parameters that are needed to generate speech. 
When using linear predictive coding techniques, three 
ingredients are essential: gain or energy, pitch period, and 
the filter reflection coefficients or spectral envelope 
parameters. 

Figure 4 illustrates the fact that, depending on the
specified frame length, these ingredients must change every
10 to 20 ms. On a frame-by-frame basis the incoming signal
is processed to obtain the gain, the pitch period and the
reflection coefficients k1, k2, ..., kN.

The pitch period and the gain parameters are used to
construct an excitation function for production of either
voiced or unvoiced speech. This driving or excitation
function is input to a filter which is configured by the
spectral envelope parameters determined from the analysis.
The output is one frame of synthetic speech, and by
stringing several frames of speech together, audible sounds
are produced [Ref. 7:pp. 337-345].

Analysis of the speech signal is done by calculating the
LPC model parameters for each 10 ms time frame. This
chapter will discuss these essential parameters.


LPC Model of the Human Voice [Ref. 7:p. 338].

Figure 4.


B. LPC ENCODING PARAMETERS
1. Voiced / Unvoiced Decision Making

Some sounds require the vibrations induced by the
vocal cords, while others do not. Voiced sounds represent
those that require an excitation from the vocal cords or
lips. Unvoiced sounds are generated by a steady flow of air
as in the case of 's' or 'f'. A decision must be made in
order to properly excite the digital filter to produce the
desired sounds.

According to Atal [Ref. 5:p. 280] the voiced/unvoiced
decision is based on the ratio of the mean-squared
value of the speech samples to the mean-squared value of the
prediction error samples. This ratio is considerably
smaller for unvoiced speech sounds than for voiced speech
sounds. Typically, the decision threshold for this ratio is
a factor of 10.

Voiced Decision:   $\mathcal{E}[s_n^2] > 10\,\mathcal{E}[e_n^2]$
Unvoiced Decision: $\mathcal{E}[s_n^2] < 10\,\mathcal{E}[e_n^2]$

This decision will determine whether to excite the
digital filter with an impulse function or white noise, each
having a particular gain or energy.
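
The decision rule above amounts to comparing two mean-squared values.
A minimal Python sketch, assuming the speech frame and its prediction
error (residual) are already available as lists of samples; the
function names and the `threshold` parameter are hypothetical, with the
default of 10 taken from the text:

```python
def mean_square(samples):
    """Mean of the squared sample values of one frame."""
    return sum(v * v for v in samples) / len(samples)


def is_voiced(speech, residual, threshold=10.0):
    """True when E[s^2] exceeds threshold * E[e^2], i.e. a voiced frame."""
    return mean_square(speech) > threshold * mean_square(residual)


if __name__ == "__main__":
    speech = [1.0, -1.0, 1.0, -1.0]
    print(is_voiced(speech, [0.1, -0.1, 0.1, -0.1]))  # well-predicted frame
    print(is_voiced(speech, [0.5, -0.5, 0.5, -0.5]))  # poorly predicted frame
```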


2. Gain Computation

In explaining the least squares method of linear
prediction we assumed that the input was unknown.
Equation 3-5 can be rewritten as

$$s_n = -\sum_{k=1}^{p} a_k s_{n-k} + e_n \tag{4-1}$$

Comparing Equations 3-2 and 4-1 we see that the only input
signal $u_n$ that will result in the signal $s_n$ as output is
that where $G u_n = e_n$. That is, the input signal is
proportional to the error signal. For any other input the
output will be different from $s_n$. We therefore require
that the energy of the output signal be equal to the energy
of the original signal $s_n$.

Since the filter H(z) is fixed, it is clear from the
above that the total energy in the input signal $G u_n$ must
equal the total energy in the error signal, which is given
by $E_p$. Again, Makhoul [Ref. 6:p. 128] is the primary source
for this information and he provides additional mathematical
background in determining the resultant gain equation

$$G^2 = E_p = R(0) + \sum_{k=1}^{p} a_k R(k) \tag{4-2}$$

where $G^2$ is the total energy in the input and R(k) is,
again, the autocorrelation of the signal.

The classification of a sound as voiced or unvoiced
determines the input to the filter H(z). However, whether the
input $G u_n$ is white noise or a series of impulses, the gain
is calculated from the same equation.
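
Equation 4-2 translates directly into code. The sketch below assumes
the autocorrelation values R(0)..R(p) and the predictor coefficients
a_1..a_p (sign convention of Eq. 3-5) have already been computed; the
name `predictor_gain` and the clamping of tiny negative rounding errors
to zero are assumptions added for illustration.

```python
import math


def predictor_gain(r, a):
    """G from G^2 = R(0) + sum_k a_k R(k), with a[0] = a_1 and r[0] = R(0)."""
    energy = r[0] + sum(ak * r[k] for k, ak in enumerate(a, start=1))
    return math.sqrt(max(energy, 0.0))  # guard against rounding below zero


if __name__ == "__main__":
    # Hypothetical first-order case: R(0) = 14, R(1) = 8, a_1 = -R(1)/R(0).
    print(predictor_gain([14.0, 8.0], [-8.0 / 14.0]))
```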
3. Pitch Period

The period of time that elapses between each
excitation pulse is referred to as the pitch period. Atal
[Ref. 5:p. 279] describes two different methods for
determining pitch period. His second method is summarized
here since it is based on the linear predictive
representation of the speech wave.

In this method, except for a sample at the beginning
of each pitch period, every sample of the voiced speech
waveform can be predicted from the past values. The method
of determining pitch period is relatively simple.

Once the prediction error of the speech signal is
determined through linear predictive processing, the largest
or peak values are noted (Figure 5). These points
determine the times that excitation pulses should be
initiated from the excitation source. This simple
peak-picking procedure was found to be effective in determining
pitch period as developed in Reference 7.
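
A simple peak-picking pass over the prediction error might look like
the following Python sketch. It is not the procedure of Reference 7:
the relative threshold, the minimum spacing between peaks, the use of
the median gap as the pitch period, and the function names are all
assumptions made for illustration.

```python
def pick_peaks(residual, rel_threshold=0.5, min_spacing=1):
    """Indices where |e_n| reaches rel_threshold times the largest |e_n|."""
    if not residual:
        return []
    peak = max(abs(v) for v in residual)
    if peak == 0.0:
        return []
    picks = []
    for n, v in enumerate(residual):
        if abs(v) < rel_threshold * peak:
            continue
        if picks and n - picks[-1] < min_spacing:
            # Too close to the previous pick: keep whichever is larger.
            if abs(v) > abs(residual[picks[-1]]):
                picks[-1] = n
            continue
        picks.append(n)
    return picks


def pitch_period(residual, rel_threshold=0.5, min_spacing=1):
    """Median spacing (in samples) between picked peaks, or None if < 2 peaks."""
    picks = pick_peaks(residual, rel_threshold, min_spacing)
    if len(picks) < 2:
        return None
    gaps = sorted(b - a for a, b in zip(picks, picks[1:]))
    return gaps[len(gaps) // 2]


if __name__ == "__main__":
    # Synthetic residual: a large error every 8 samples, small elsewhere.
    e = [1.0 if n % 8 == 0 else 0.1 for n in range(32)]
    print(pick_peaks(e), pitch_period(e))
```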
4. Reflection Coefficients

Earlier it was mentioned that the reflection
coefficients determined using LPC are directly related to
the polynomial coefficients of an all pole model. This
section will show the relationship between them and
illustrate how the reflection coefficients are determined.

Recall that we are looking for an estimated output
which is the weighted sum of past system outputs (see
Eqns. 3-4 and 3-5). The autoregressive (AR) model in
Figure 6 illustrates this process.

Autoregressive Model

Figure 6.


The goal of LPC is to adjust the $a_k$'s to minimize
$E_p$. Achieving it involves solution of a linear system of
equations, using Levinson's algorithm, and leads to the
lattice structure AR model we are most interested in (see
Figure 7). The mathematical development for this may be
found in Parker [Ref. 9:pp. 110-112].

Lattice Structure Analysis Model

Figure 7.


Lattice structuring requires the determination of
reflection coefficients, hereafter referred to as K. The
K's of an Nth order lattice filter transfer function are
related to the polynomial coefficients of an Nth order AR
filter transfer function through the following matrix
equation:

$$a^{(N+1)} = \begin{bmatrix} a^{(N)} \\ 0 \end{bmatrix} + K^{(N+1)} \begin{bmatrix} \tilde{a}^{(N)} \\ 1 \end{bmatrix} \tag{4-3}$$

where

$$K^{(N+1)} = -\frac{R_{yy}(N+1) + \tilde{r}_{yy}^{(N)T} a^{(N)}}{R_{yy}(0) + a^{(N)T} r_{yy}^{(N)}} \tag{4-4}$$

Here a tilde denotes a vector with its elements in reverse
order. The matrix $r_{yy}$ is the last column of the $R_{yy}$
autocorrelation matrix mentioned earlier. The notation has
been slightly altered from Parker's presentation [Ref. 9:p.
112] to be consistent with the preceding chapters of this
development.
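
To make the order-update of Equations 4-3 and 4-4 concrete, the
following Python sketch runs a Levinson-type recursion on a list of
autocorrelation values. It assumes the sign convention of Eq. 3-5
(error = s_n + sum of a_k s_(n-k)); the function name `levinson` and
its return layout are hypothetical and are not taken from Parker or
from the thesis programs.

```python
def levinson(r, order):
    """Return (a, reflection, err) from autocorrelation r[0..order].

    a[j] holds a_(j+1) of A(z) = 1 + sum_k a_k z^-k, reflection[n] holds
    K^(n+1), and err is the final prediction error energy.
    """
    a = []
    reflection = []
    for n in range(order):
        # Denominator of Eq. 4-4: R(0) + a^T r (error energy at order n).
        err = r[0] + sum(a[j] * r[j + 1] for j in range(n))
        if err == 0.0:
            break  # signal is perfectly predicted; stop early
        # Numerator of Eq. 4-4: R(n+1) + reversed-r^T a.
        acc = r[n + 1] + sum(a[j] * r[n - j] for j in range(n))
        k = -acc / err
        # Eq. 4-3: append a zero, then add K times the reversed vector
        # with a trailing one.
        a = [a[j] + k * a[n - 1 - j] for j in range(n)] + [k]
        reflection.append(k)
    err = r[0] + sum(ak * r[j + 1] for j, ak in enumerate(a))
    return a, reflection, err


if __name__ == "__main__":
    a, ks, err = levinson([14.0, 8.0, 3.0], 2)
    print(a, ks, err)
```

Note that the last entry of the coefficient vector at each order equals
the reflection coefficient of that order, which is what ties the two
representations together.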

Equations 4-3 and 4-4 have been included in this
presentation to show how the polynomial coefficients ($a_k$'s)
are related to the reflection coefficients (K's).
However, there is an easier and more direct method of
determining the K's. A brief development is presented here.

Working in the Z-domain, we know that the transfer
function of the AR model is

$$A^{(N)}(z) = 1 + a_1^{(N)} z^{-1} + a_2^{(N)} z^{-2} + \cdots + a_N^{(N)} z^{-N} \tag{4-5}$$

and

$$\tilde{A}^{(N)}(z) = z^{-N} A^{(N)}(z^{-1}) = z^{-N} + a_1^{(N)} z^{-(N-1)} + \cdots + a_N^{(N)} \tag{4-6}$$

where $\tilde{A}(z)$ is A(z) in reverse order.

Combining and reforming in matrix form yields

$$A^{(N+1)}(z) = 1 + \begin{bmatrix} z^{-1} & z^{-2} & \cdots & z^{-(N+1)} \end{bmatrix} a^{(N+1)} \tag{4-7}$$

or more simply

$$A^{(N+1)}(z) = A^{(N)}(z) + K^{(N+1)} z^{-1} \tilde{A}^{(N)}(z) \tag{4-8}$$

and

$$\tilde{A}^{(N+1)}(z) = z^{-1} \tilde{A}^{(N)}(z) + K^{(N+1)} A^{(N)}(z) \tag{4-9}$$

Writing Equation 3-5 in the Z-domain yields

$$E^{(N)}(z) = A^{(N)}(z)\, S(z) \tag{4-10}$$

Combining 4-10 with 4-8 and 4-9 and returning to the time
domain yields the following error equations:

$$e^{(N+1)}(k) = e^{(N)}(k) + K^{(N+1)} \tilde{e}^{(N)}(k-1) \tag{4-11}$$

and

$$\tilde{e}^{(N+1)}(k) = \tilde{e}^{(N)}(k-1) + K^{(N+1)} e^{(N)}(k) \tag{4-12}$$

where $e^{(N+1)}(k)$ is the forward difference error, and
$\tilde{e}^{(N)}(k)$ is the backward difference error.
Equations 4-11 and 4-12
correspond to the lattice implementation in Figure 7. They
have been used to determine the K's of a 12th order model in
the sine wave and speech experiments which follow.
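
The lattice recursion of Equations 4-11 and 4-12 can be sketched in
Python as an analysis filter that turns a signal into its forward
prediction error, one stage per reflection coefficient. This is an
illustrative sketch only: it assumes the K's are already known and
does not reproduce how the thesis programs estimated them; the name
`lattice_residual` is hypothetical.

```python
def lattice_residual(signal, reflection):
    """Forward prediction error from a lattice analysis filter.

    Each stage applies
        e_fwd_next(k)  = e_fwd(k)      + K * e_back(k - 1)
        e_back_next(k) = e_back(k - 1) + K * e_fwd(k)
    starting from e_fwd = e_back = input sample at stage zero.
    """
    delayed_back = [0.0] * len(reflection)  # e_back(k - 1) for each stage
    out = []
    for s in signal:
        fwd = s
        back = s
        for i, k in enumerate(reflection):
            back_prev = delayed_back[i]
            delayed_back[i] = back          # store for the next sample
            fwd_next = fwd + k * back_prev
            back = back_prev + k * fwd      # uses fwd before it is updated
            fwd = fwd_next
        out.append(fwd)
    return out


if __name__ == "__main__":
    # With K = [-4/7, 1/6] the impulse response is 1, -2/3, 1/6, 0, ...
    print(lattice_residual([1.0, 0.0, 0.0, 0.0], [-4.0 / 7.0, 1.0 / 6.0]))
```

Feeding an impulse through the lattice returns the equivalent
direct-form coefficients, which is a convenient check on the recursion.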

The order of the filter is simply determined by
assigning N. For speech, anywhere from a 6th to a 12th
order model has been found to be sufficient.

The reflection coefficients are determined every 10 to
20 milliseconds and, when lined up side by side, appear to
present a spectral envelope (Figure 8).


Display of Analysis/Synthesis Parameters [Ref. 10:p. 163].

Figure 8.

Determining the reflection coefficients, in any case,
is a straightforward calculation, which is an attractive
feature of LPC. It is the pattern these K's may produce in
our experiment that we will be most interested in.


5. Spectral Analysis

A convenient way to portray the frequency content of
speech is through the determination of formant frequencies.
Formant frequencies are the most prominent frequencies
present in a speech waveform.

Formant frequencies are not required to produce LPC
synthesized speech. In other words, given the voiced
decision, gain, pitch period, and the reflection
coefficients, one has enough information to reconstruct the
speech wave form. However, the determination of the formant
frequencies aids us in depicting a frequency transposition.


a. Formant Frequencies

The complex roots of the denominator polynomial
are the complex formants (bandwidths and frequencies) used
to approximate the speech signal. The coefficients, $a_k$, of
the denominator polynomial are obtained from time-domain
calculations on samples of a short segment of the speech
waveform; namely $\{s_n\} = \{s_1, s_2, \ldots, s_N\}$, where
$N \gg p$. Here
N is the number of samples, and p is the order of the
polynomial [Ref. 11:pp. 364-366].

Under the assumption that the waveform
samples, $s_n$, are samples of a random Gaussian process, the
entire speech sample is broken up into groups of an equal
number of samples which we will refer to as segments
(Figure 9). Each segment is processed using the Fast Fourier
Transform (FFT) and then low pass filtered if desired.


[Flow chart: the 1st through nth speech segments are each
processed to give the frequency content for a 10 ms segment.]

Flow Chart for Obtaining the Spectral Content
of One Complete Utterance

Figure 9.


The output of each segment contains the spectral
content of that segment. The segments are sequenced together
to yield a time-varying frequency content profile of the
entire utterance, with each segment containing its particular
frequency content. The formant frequencies are the most
prevalent, or peak, frequencies found in the speech wave
form.
C. SPEECH SYNTHESIS

A speech signal is synthesized by using the same
parameters determined with LPC analysis. A block diagram of
a speech synthesizer was shown in Figure 4. The control
parameters supplied to the synthesizer are the pitch period,
a binary voiced or unvoiced parameter, the rms value of the
speech samples or gain, and the predictor or reflection
coefficients.

The pulse generator produces a pulse of unit amplitude
at the beginning of each pitch period. The white noise
generator produces uncorrelated uniformly distributed random
samples with standard deviation equal to 1 at each sampling
instant. The selection between the pulse generator and the
white noise generator is made by the voiced-unvoiced switch.
The synthesizer control parameters are reset to their new
values at the beginning of every pitch period for voiced
speech and once every 10 msec for unvoiced speech.

The amplitude of the excitation signal is adjusted by
the amplifier G. The linearly predicted value $\hat{s}_n$ of the
speech signal is combined with the excitation signal $u_n$ to
form the nth sample of the synthesized speech signal. The
signal is finally low-pass filtered to provide the
continuous speech wave $\{s_n\}$. Atal [Ref. 5:p. 280] provides
the mathematical development needed to synthesize these
parameters. A mathematical discussion will not be pursued
further here.


V. DIGITAL FREQUENCY TRANSPOSITION


A. INTRODUCTION

The object of this research was to determine an
algorithm that will digitally transpose speech using linear
predictive coding. In this chapter, Hall's research [Ref.
3] will be briefly discussed and summarized. A new theory
will then be postulated and a simple experiment using pure
sine waves will be presented to test the credibility of the
theory. Keep in mind that the real test will be the actual
processing of speech; this section simply sets the scene for
further study.


B. POLE SHIFTING IN THE Z-PLANE

Only the highlights and summary of Hall's thesis will be
presented here. His goal was to change the pole locations
before reconstruction (of the sampled speech signal) to
produce the output voice with different pitch and formant
frequencies while retaining a natural sound and the same
information [Ref. 3:p. 47].

The autoregressive vocal tract transfer function used in
his research is represented by Equation 5-1.
$$H(z) = \frac{1}{1 - 2 e^{-2\pi(BW)T} \cos(2\pi F T)\, z^{-1} + e^{-4\pi(BW)T} z^{-2}} \tag{5-1}$$

where F is the center frequency of the formant, and BW is
the bandwidth of the formant. The pole locations associated
with this transfer function are:

$$z = e^{-2\pi(BW)T}\, e^{\pm j 2\pi F T}$$

Converting Equation 5-1 into polar form produces Equation
5-2.

$$H(z) = \frac{1}{(1 - A e^{j\theta} z^{-1})(1 - A e^{-j\theta} z^{-1})} \tag{5-2}$$

Through several mathematical manipulations and solving
for A and $\theta$, the following relationships for F and BW are
determined:

$$F = \frac{\theta}{2\pi T} \tag{5-3}$$

$$BW = \frac{-\ln A}{2\pi T} \tag{5-4}$$

where $A = e^{-2\pi(BW)T}$ and $\theta = 2\pi F T$.

Assuming that a linear relationship exists between F
(the original frequency) and F' (the modified frequency),
several general expressions are stated to illustrate the
underlying modification to the pole locations. Note that
the following equations are all linear relationships.


F’ = ЌЕ (5-5) 
BW’ = “BW (5-6) 
a’ = Ye (5-7) 
ца. (5-8) 


The most important consideration for producing these
relationships is guaranteeing that no unstable poles will be
created by shifting them outside the unit circle. For more
of the specifics on Hall's development see Reference 3,
pages 49 and 50.

Two experiments are illustrated in Hall's thesis. They
are:

1. Pitch was reduced by a factor of .58 and the
formant frequencies reduced by .88 for voiced
speech.

2. The same modification was done for a segment of
unvoiced speech.

Hall concluded that upon completion of the process most
listeners agreed that, although the input speech was female,
the modified output speech sounded typically male. It was
also noted that although the audio output was somewhat
lacking in quality, it was intelligible [Ref. 3:p. 73]. The
tapes which recorded that audio output are no longer
available for subjective evaluation.
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Linear predictive coding is a means to an end for Hall. 
He modifies the the variables mentioned (F,BW,0,A), and 
processes the speech with LPC computer programs. This 
conversion between an autoregressive vocal track model and a 


LPC model (implemented most easily by a lattice filter 


configuration) is possible through Equations (4-3) апа 
(4-4). 
The mathematics are simple. What is most important here 


is that the relationship between the two different 
representations of speech, the AR model and the LPC model, 
are closely associated with one another. To calculate one, 


in a sense, is to calculate the other. 
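
The close association between the two representations can be
sketched with the standard step-up and step-down recursions,
which convert reflection coefficients to predictor
coefficients and back. This Python sketch is an illustration
only. Equations 4-3 and 4-4 are not reproduced here, so the
sign convention (denominator 1 - a_1 z^{-1} - ... - a_p z^{-p},
with K_i equal to the last predictor coefficient of order i)
is an assumption, and the function names are hypothetical.

```python
import math

def step_up(ks):
    """Reflection coefficients K1..Kp -> predictor coefficients a_1..a_p."""
    a = []
    for i, k in enumerate(ks, start=1):
        # Order-i update: a_j(i) = a_j(i-1) - k * a_(i-j)(i-1), and a_i(i) = k
        a = [a[j] - k * a[i - 2 - j] for j in range(i - 1)] + [k]
    return a

def step_down(a):
    """Predictor coefficients a_1..a_p -> reflection coefficients K1..Kp."""
    a = list(a)
    ks = [0.0] * len(a)
    for i in range(len(a), 0, -1):
        k = a[i - 1]
        ks[i - 1] = k
        denom = 1.0 - k * k          # assumes |k| is not equal to 1
        a = [(a[j] + k * a[i - 2 - j]) / denom for j in range(i - 1)]
    return ks

# One formant: denominator 1 - 2A cos(theta) z^-1 + A^2 z^-2,
# so a_1 = 2A cos(theta) and a_2 = -A^2 (assumed example values).
T = 1.0 / 10000.0
A = math.exp(-2.0 * math.pi * 60.0 * T)
theta = 2.0 * math.pi * 500.0 * T
a_formant = [2.0 * A * math.cos(theta), -A * A]
ks = step_down(a_formant)
```

Under these assumptions a single formant gives K2 = -A^2 and
K1 = 2A cos(\theta) / (1 + A^2), so K1 depends on both the
pole angle and the pole radius.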


C. A NEW PROPOSITION 
1. Statement of Theory 

As mentioned earlier, LPC techniques can serve as a
tool for modifying the spectral properties of the speech
signal. This thesis postulates that a linear relationship
exists between the reflection coefficients, which determine
the spectral envelope of the speech waveform, and the
frequency content of that waveform. If this relationship
exists and can be determined, then selectively modifying
the reflection coefficients will also modify the frequency
content.


Is there a linear relationship between the
reflection coefficients (K's) and frequency content? The
first step in our proof is to analyze the most simplified
case. Since speech is often represented as a combination of
many different frequencies, the simplest case would be to
analyze a fixed frequency sine wave. If the results turn
out to be negative, then exploring the more complex case
(speech) would probably be futile.


2. Sine Wave Experiment 

At any given frequency a pure sine wave may be
considered a signal of constant energy and amplitude which
will generate an audible pitch when it is within the 200 Hz
to 15 kHz audible range. When dealing with normal speech
waveforms, the audible pitch range is somewhere between
200 Hz and 5 kHz.

A computer program was written in Fortran [App. A]
for use on the IBM 3033 to produce a sine wave for further
analysis. The resultant sine wave could be sampled at any
desired rate and the frequency of the wave could be
incremented to satisfy the range requirements of 200 Hz to
5 kHz.

Once the sampling rate was determined and the sine
wave frequency set, the reflection coefficients were
calculated for a 10 ms time frame, stored in a file,
and plotted to determine if a relationship exists between
frequency and the nth order K's.

To determine 12 reflection coefficients (K's) for
each frequency, Equations 4-11 and 4-12 were used.
Additional runs were also made to determine the effect of
noise on the outcome. The results were promising.
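
The calculation described above can be sketched as follows.
Equations 4-11 and 4-12 are not reproduced in this section,
so this Python sketch substitutes the standard
autocorrelation method with the Levinson-Durbin recursion,
which may differ from the lattice formulation used in this
research. The function names and the sign convention
(K1 = r(1)/r(0)) are assumptions.

```python
import math

def autocorrelation(x, max_lag):
    """Biased autocorrelation r(0)..r(max_lag) of one frame."""
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]

def reflection_coefficients(x, order=12):
    """Levinson-Durbin recursion; returns K1..K(order).

    Assumes a non-silent frame, so that r(0) is greater than zero.
    """
    r = autocorrelation(x, order)
    a = []          # predictor coefficients of the previous order
    err = r[0]      # prediction error energy
    ks = []
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - 1 - j] for j in range(i - 1))
        k = acc / err
        a = [a[j] - k * a[i - 2 - j] for j in range(i - 1)] + [k]
        err *= 1.0 - k * k
        ks.append(k)
    return ks

def sine_frame(freq_hz, fs_hz=10000.0, duration_s=0.010):
    """One frame of a unit-amplitude sine wave (100 samples by default)."""
    n = int(round(fs_hz * duration_s))
    return [math.sin(2.0 * math.pi * freq_hz * i / fs_hz) for i in range(n)]

ks = reflection_coefficients(sine_frame(1000.0))   # 12 values for a 1 kHz tone
```

For a 1 kHz sine wave sampled at 10 kHz over exactly 100
samples, this sketch gives K1 = cos(2\pi 1000/10000), or
approximately 0.809.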


3. Sine Wave Experimental Results 

Appendixes B and C illustrate the apparent linear
relationship that exists between frequency and the LPC nth
order K's in a noiseless environment. Appendixes D and E
illustrate that same relationship in a noisy environment
(S/N = 10:1).

It would appear that a linear relationship does
exist between the frequency of a sine wave and its
reflection coefficients. Noise, on the other hand, changes
that linear relationship. Noise addition seems to affect K7
through K12 much more than K1 through K6.

Considering the mathematics involved in calculating
K, these observations are reasonable. Since the later K's
are affected most by small changes in the input signal,
addition of noise will affect them more drastically than the
earlier stages.
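
A noise run of this kind can be sketched in Python as shown
below. This is not the program used in this research. It
assumes that S/N = 10:1 is a power ratio, uses seeded
Gaussian noise, and substitutes the standard Levinson-Durbin
recursion for Equations 4-11 and 4-12. All function names are
hypothetical.

```python
import math
import random

def reflection_coefficients(x, order=12):
    """Autocorrelation method with the Levinson-Durbin recursion."""
    n = len(x)
    r = [sum(x[i] * x[i + lag] for i in range(n - lag))
         for lag in range(order + 1)]
    a, err, ks = [], r[0], []
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - 1 - j] for j in range(i - 1))
        k = acc / err
        a = [a[j] - k * a[i - 2 - j] for j in range(i - 1)] + [k]
        err *= 1.0 - k * k
        ks.append(k)
    return ks

def power(x):
    """Mean squared value of a sequence."""
    return sum(v * v for v in x) / len(x)

def add_noise(signal, snr_power_ratio, seed=0):
    """Add Gaussian noise scaled so signal power / noise power = ratio."""
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in signal]
    scale = math.sqrt(power(signal) / (snr_power_ratio * power(noise)))
    return [s + scale * n for s, n in zip(signal, noise)]

clean = [math.sin(2.0 * math.pi * 1000.0 * i / 10000.0) for i in range(100)]
noisy = add_noise(clean, 10.0)
k_clean = reflection_coefficients(clean)
k_noisy = reflection_coefficients(noisy)
# Absolute change in each of K1..K12 caused by the added noise
changes = [abs(c - n) for c, n in zip(k_clean, k_noisy)]
```

Printing `changes` for several frequencies and seeds is one
way to compare how strongly the early and late stages respond
to the added noise.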

Though these observations are promising, they are by
no means conclusive. If no relationship between the K's and
frequency existed, another scheme would have had to be
considered. Nevertheless, speech is the more complicated
signal that we consider in the next two sections.


VI. SPEECH PROCESSING EXPERIMENT


A. INTRODUCTION 

Now that the fundamentals of linear predictive coding
have been presented and a theory of frequency transposition
proposed, it is necessary to work directly with speech
itself. To obtain the information we are seeking, the
correlation between reflection coefficients and frequency
content, speech samples must be examined.

Documentation concerning the data acquisition system
used in this research to obtain speech samples is provided
as Appendix F. This chapter discusses the data itself, and
the processing of it.


В. VOICED/UNVOICED PHRASES 
Three phrases were chosen for their voiced and unvoiced
characteristics as described in Chapters 2 and 4. They are:
1) "READY"
2) "SO WHAT"
3) "SNEEZE"
Each phrase was repeated at a different pitch and, to
make things simple, the musical scale was picked to help
harmonize a change in pitch with some type of reference. In
other words, "READY" was first spoken in the middle-C range,
and then in the D range, until it was finally spoken in the
high-C range.

This procedure yielded eight different pitches for each
of the three phrases. One male speaker provided the data
for all three phrases. Additionally, the period of each
utterance remained constant across the pitches.
For a graphical representation of the selected speech
utterances, refer to Appendices G, H, and I.

Each phrase was chosen for content and can be classified
as voiced, unvoiced, or a combination of both. "READY" is
strictly a voiced word, whereas "SO WHAT" and "SNEEZE" are a
combination of voiced and unvoiced segments. The S, WH, and
T sounds in "SO WHAT" will be our unvoiced example, and
"SNEEZE" will be the combined example as the data is
analyzed.


C. DATA PROCESSING
This section discusses the techniques utilized to
analyze the data and the observations made.

The raw speech data was edited and displayed using a
generic display program. The data is 8-bit information with
a maximum range of 256 equally spaced values. The
resolution of each utterance varied with the pitch. The
lower frequencies tended to have less gain or energy and
therefore did not use all of the 256 values available. A
summary of the ranges is provided in Appendix J.

The periods of each phrase were different. The
differences between the same utterance at different pitches
varied as much as 20 msec. A short summary of the average
periods is given in Table I.


TABLE I

UTTERANCE     PERIOD     NO. SEGMENTS        NO. DATA PTS/SEG
"XXX"         (sec)      N (10 msec SEG)
"READY"       .30        30                  100
"SO WHAT"     -          -                   100
"SNEEZE"      .38        38                  100


The sampling rate was 10 kHz for all of the
utterances, so the number of data points in each 10 msec
segment is 100.
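
The segment arithmetic above can be sketched as follows. This
Python fragment is illustrative only. The function name is
hypothetical and the list of zeros stands in for a recorded
utterance of 0.30 sec.

```python
def split_into_segments(samples, fs_hz=10000, segment_ms=10):
    """Split a sample list into consecutive, non-overlapping segments."""
    seg_len = fs_hz * segment_ms // 1000        # 100 points at 10 kHz
    count = len(samples) // seg_len             # drop any partial segment
    return [samples[i * seg_len:(i + 1) * seg_len] for i in range(count)]

utterance = [0] * 3000                          # stand-in for 0.30 sec of data
segments = split_into_segments(utterance)       # 30 segments of 100 points
```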


2. Determining Reflection Coefficients 
Once the starting point is determined for each
utterance, the reflection coefficients are calculated for 10
msec segments of speech [App. K]. Successive segments are
analyzed to yield their respective reflection coefficients
using Equations 4-11 and 4-12, as were the sine wave
calculations.

Reflection coefficients K1 through K6 were plotted
for each of the 24 utterances and several of the resultant
curves are included as Appendix L.


a. Trend Analysis
A graphical trend analysis of the plotted data
was undertaken to detect any obvious patterns. The details
of that analysis are included as Appendix M. In summary,
those observations lead us to the conclusion that no
trends of any significance were noted as a function of
pitch.


b. Graphical Correlation 

One graph was held stationary as a reference and
the others were passed over it to see if there were any
obvious matches. There is nothing more elaborate to
report than that no correlation was noted between them.
Even though at times there were 2 or 3 points which matched
up, the other 28, 36, or 38 points did not. Also there
seemed to be no distinction between voiced and unvoiced
portions of the speech wave. This process leads to the
conclusion that the various speech segments are highly
uncorrelated.


3. Spectral Analysis of Reflection Coefficient Patterns

It was noted during the trend analysis that the
temporal patterns presented by the reflection coefficients
seemed periodic. At first it was believed that this could
possibly reflect the pseudo-periodic nature of speech or the
excitation source.

Spectral analysis was implemented using a Fortran
subroutine to compute the FFT of each pattern. The program
is included as Appendix N and several examples of the
results are provided as Appendix O.

In summary, all of the spectra turned out to be
relatively flat. This indicates that there are no prominent
frequencies within the reflection coefficient sequences.
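
The flatness test can be sketched as follows. This Python
fragment is not the Fortran subroutine of Appendix N. It uses
a direct discrete Fourier transform, which is adequate for
sequences of 30 to 38 points, and a peak-to-mean ratio as a
crude, assumed measure of prominence. Since one K value is
produced per 10 msec segment, the sequence is treated as
sampled at 100 Hz. The function names and test sequences are
hypothetical.

```python
import cmath
import math

def magnitude_spectrum(seq, rate_hz=100.0):
    """Return (frequency, magnitude) pairs from 0 Hz to rate/2."""
    n = len(seq)
    mean = sum(seq) / n
    centered = [v - mean for v in seq]          # remove the DC level
    result = []
    for b in range(n // 2 + 1):
        acc = sum(centered[t] * cmath.exp(-2j * math.pi * b * t / n)
                  for t in range(n))
        result.append((b * rate_hz / n, abs(acc)))
    return result

def peak_to_mean(spectrum):
    """Ratio of the largest bin to the average bin, skipping 0 Hz."""
    mags = [m for _, m in spectrum[1:]]
    mean = sum(mags) / len(mags)
    return max(mags) / mean if mean > 0.0 else 0.0

# A sequence with a true 10 Hz periodicity shows one dominant bin.
periodic = [0.5 * math.cos(2.0 * math.pi * 10.0 * t / 100.0) for t in range(30)]
# A single impulse has a perfectly flat spectrum.
impulse = [1.0] + [0.0] * 29

ratio_periodic = peak_to_mean(magnitude_spectrum(periodic))
ratio_flat = peak_to_mean(magnitude_spectrum(impulse))
```

A ratio near 1 corresponds to a flat spectrum, while a
sequence with a real periodicity gives a much larger ratio.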


4. Spectral Analysis for Frequency Content

Spectral analysis to determine the frequency content
of each utterance, as described in Chapter 4, would have
been useful had a pattern or linear relationship shown up in
the observations mentioned.

Since there are no patterns or correlations worth
mentioning, exploring the specific frequency content of each
utterance would not benefit us. The relative difference
between each frequency, or \Delta f, is approximately 32 Hz.

The range of the utterances was chosen to coincide
with the musical scale from middle-C to high-C (a 256 Hz
difference). Had a relationship been discovered, as
proposed, then a more in-depth spectral analysis of the
input speech would have been in order.


D. SUMMARY OF EXPERIMENTAL RESULTS 
1. Correlation Between Phrases With Different Pitches
The linear relationship postulated in Chapter 5
should have yielded more obvious results if relationships
did exist between identical phrases spoken at different
pitches. Three of the four categories mentioned above
yielded negative or uncorrelated results.


2. Voiced/Unvoiced Observations
Though there may be other or more sophisticated
techniques available to analyze this data, the methods
mentioned above were sufficient to show that a voiced phrase
was no more correlated than an unvoiced phrase.
The consistently negative or uncorrelated results
lead us to some conclusions about the actual
relationship between frequency content and reflection
coefficients.


VII. CONCLUSIONS AND RECOMMENDATIONS


A. CONCLUSIONS 

A new theory to transpose frequency was postulated and
tested. Initial results, using sine waves, seemed positive
and led to a further study using speech waveforms. The
preceding experiment and subsequent analysis of speech
showed no apparent correlation between pitch and reflection
coefficient values. These results may be attributed to the
following reasons.


1. Complexity of Speech 

The speech waveform is a very complex combination
of gain, excitation, and spectral content. To pick out one
particular attribute and analyze it for a particular
phenomenon, such as frequency content, may be unrealistic.

Speech has historically been modeled as a
combination of sine waves. However, slow progress in the
field of speech processing has caused engineers to rethink
this point in terms of the physics involved in generating
speech. This leads to our next conclusion.


2. Physical/Mathematical Relationship 


The experimental results indicate that, in this
case, there is no obvious relationship between the physics
(pitch) of speech and the LPC mathematical representation of
speech (reflection coefficients).

This observation makes sense. Since reflection
coefficient determination is based on probabilistic methods,
error feedback, and random input samples, the resultant
output of each lattice stage no longer resembles the
original signal. Once the error signal passes through the
first stage of the lattice network, its characteristics have
been altered as much as 10 percent. Reflection coefficients
are therefore a tool for determining predicted error
calculations based on past inputs, and not a physical
interpretation of the signal's content.

Just as engineers are in error when they refer to the
pattern that successive reflection coefficients present as
the signal's spectral envelope, reflection coefficients do
not directly reflect the frequency content of the signal.


3. Periodic/Pseudo-Periodic Differences
Simulation and experimental results show that
reflection coefficients work differently with periodic
signals (sine wave) than with pseudo-periodic signals
(speech).
In calculating the reflection coefficients for a
sine wave, the samples of one frequency are changed very
slightly from the previous frequency's samples. Therefore
the calculated reflection coefficients also change very
slightly. This observation may be useful in the design of
an LPC musical synthesizer, where frequency content and
adjustment are processed in a controlled environment.

On the other hand, speech behavior is more random
than music. It is pseudo-periodic in the sense that
complex vibrations are necessary to produce the speech
waveform. However, the rate and randomness at which those
vibrations change frequencies seem to prevent the
reflection coefficients from having any kind of linear
relationship with frequency content.

It is therefore the conclusion of this research that
the relationship between frequency content of speech and
reflection coefficients is sufficiently complex that
modifying reflection coefficients in order to transpose
pitch will not be practical.


B. RECOMMENDATIONS 

The conclusions have stated that there is no linear
relationship present between frequency content and
reflection coefficients. Recall that the motivation behind
this research was based on Hall's research [Ref. 3]
concerning pole shifting. Therefore the following actions
are recommended if further or more extensive study is
desired.

1. Continue Hall's research using LPC as a tool for
   speech analysis/synthesis, but focusing attention on
   the shifting of poles and not on the adjustment of
   reflection coefficients.

2. Use a data acquisition system that yields 12 or 16
   bit resolution of the speech samples.

3. Build a larger data base containing speech
   utterances at different pitch levels and have the
   speakers be both male and female.

4. Have the ability to match articulation patterns and
   synchronize points where speech utterances begin and
   end.

5. Synthesize the input and processed speech to check
   for intelligibility of the utterances.

6. Use more sophisticated techniques for pattern
   recognition.

It is believed that the preceding recommendations, if
followed, will help substantiate or refute Hall's research
as well as the findings of this research. The need for an
adequate technique for frequency transposition still exists.


APPENDIX F - DATA ACQUISITION SYSTEM


A. INTRODUCTION 

There are a vast number of data acquisition systems on
the market today. Even so, the system originally planned
for the acquisition of this data broke down with no hope of
timely repair. When all possible alternatives had been
explored, it was decided that the only way to accomplish
this portion of the research was to build a system capable
of obtaining speech data samples.

This section will discuss the system, hardware, and
software utilities that were combined to produce the desired
data samples. In an effort to provide the novice, as well
as the expert, with the information needed to retrace these
steps, anything worth documenting is documented.
Additionally, a list of references is provided in the main
Bibliography of this thesis.


B. EQUIPMENT REQUIREMENTS AND SETUP 


Figure 10 shows the experiment. Selected speech
utterances were recorded on a 4-channel, 8-track tape
recorder and stored for later use. The analog to digital
(A/D) circuit was built and driving software written. This
circuit was interfaced with the Zenith-100 microcomputer
through the Prolog 7804-Z80A Processor Counter/Timer Card
and the 8255 Parallel Peripheral Interface (PPI) microchip.


Figure 10. Data Acquisition 3-Dimensional Flow Chart


Once the data was captured in the Prolog's 32K buffer,
it was uploaded to the Zenith-100, via ZMDS software, and
stored in Intel-Hex data files. The files were transferred
from the Zenith formatted disk, via an Osborne
microcomputer, and placed on Kaypro formatted disks.

A Kaypro 10 microcomputer converted the hexadecimal data
into decimal data using Microsoft Basic (MBASIC) software.
Edited versions of these files were then transferred to the
IBM-3033 mainframe computer for data processing.


C. ANALOG TO DIGITAL CIRCUIT 

The chip that provides the analog to digital conversion
is the AD-570. It provides 8-bit information at sampling
rates up to 33K samples/second. For our purposes, the
sampling rate was set at 10K samples/second since the
majority of the frequency content is below 5 kHz.

The circuit diagram [App. F] illustrates the
interfacing between the 8255 PPI chip and the host computer.
The 8255 coordinates all of the necessary handshaking in
driving the AD-570 chip.

It was necessary to amplify the signal prior to entering
the AD-570, to obtain full use of the 256 amplitudes
available. It was also necessary to provide an adjustable
DC-offset to assure a unipolar input (i.e. the middle value
had to be adjusted to be level 128 instead of level 0).

Also, the signal was filtered prior to data acquisition,
through the use of a Butterworth filter designed with a
cutoff frequency of 5 kHz. This helps smooth the data.
However, during the processing of the data it may be
necessary to filter it again. These additional circuits are
also provided as Appendix F.2.
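
The effect of the gain and DC-offset adjustments can be
sketched as follows. This Python fragment models an idealized
8-bit converter, not the AD-570 itself. The function names,
the full-scale value, and the test signals are assumptions
made for illustration.

```python
import math

def quantize_8bit(sample, full_scale=1.0):
    """Map a bipolar sample to a code 0..255 with zero input at level 128."""
    code = int(round(sample / full_scale * 128.0)) + 128
    return max(0, min(255, code))               # clip to the 8-bit range

def used_range(codes):
    """Lowest and highest levels actually used by a recording."""
    return min(codes), max(codes)

# A full-amplitude tone uses nearly every level; a weak tone does not.
loud = [quantize_8bit(math.sin(2.0 * math.pi * n / 50.0)) for n in range(100)]
quiet = [quantize_8bit(0.25 * math.sin(2.0 * math.pi * n / 50.0))
         for n in range(100)]
```

In this sketch the quiet tone occupies only levels 96 through
160, which is the kind of reduced range summarized in
Appendix J for the lower-energy utterances.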


D. MICROCOMPUTER INTERFACE 
The flow chart, provided as Appendix F.3, illustrates
the Z-80 assembly language program, Appendix F.4, that was
needed to drive the A/D circuit and collect the speech data.
The program, A2D.ASM, was also useful in testing, step by
step, the proper operation of the circuit.

The Z-80A microprocessor is at the heart of the system
and the software designed to drive it is assembled using
the Macro Assembler (M80) and linked to the Prolog station
using Link software (L80). For more information on these
procedures refer to the Bibliography.


The sampling rate is not arbitrary. It is a
function of the software. In assembly language programming
each step that the microprocessor goes through takes a
specific amount of time. We will refer to a measure of time
as a T-state. Each T-state equals the inverse of the clock
rate interfaced with the Z-80 chip. Since we are using a 4
MHz clock, one T-state equals 250 nanoseconds.

Every command line in the assembly program,
including the command 'No Operation' or NOP, requires
several T-states to accomplish its task. We are interested
in the interval of time it takes from one sample to the
next, and we modify the software accordingly.

This program has a delay loop in it (labeled DELAY)
to slow down the data acquisition to 10K samples/second.
Without the delay loop, it would easily sample
at 23K samples/second. Since each utterance was limited to
less than one second, 10K samples is workable and does not
present prohibitive record lengths.
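
The timing budget implied by these figures can be worked out
as follows. This Python fragment is arithmetic only. It does
not reproduce the T-state counts of A2D.ASM, and it assumes
that the 32K buffer holds 32768 one-byte samples. The names
are hypothetical.

```python
CLOCK_HZ = 4000000                      # 4 MHz clock, 250 ns per T-state

def t_state_seconds(clock_hz):
    """Duration of one T-state."""
    return 1.0 / clock_hz

def t_states_per_sample(clock_hz, sample_rate_hz):
    """T-states available between consecutive samples."""
    return clock_hz / sample_rate_hz

# Loop length implied by 23K samples/second with no delay (about 174)
loop_without_delay = round(t_states_per_sample(CLOCK_HZ, 23000))
# Loop length required for 10K samples/second (400)
loop_with_delay = round(t_states_per_sample(CLOCK_HZ, 10000))
# T-states the DELAY loop must consume on each pass
delay_t_states = loop_with_delay - loop_without_delay

# Longest recording that fits in the buffer at 10K samples/second
buffer_seconds = 32 * 1024 / 10000.0
```

Under these assumptions the delay loop accounts for roughly
226 T-states per sample, and the buffer holds a little over
three seconds of speech.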


E. DATA FILE SETUP AND MANIPULATION 

Once the data is collected and stored in the Prolog's
32K buffer, it is uploaded onto a Zenith 100 formatted
floppy disk and stored in an appropriately titled HEX file.
A sample of a typical segment of data is provided as
Figure 11.


Figure 11. Hexadecimal Data File Segment


The file is in Intel-Hex format. A colon starts
each line. The following '10' tells us that the line is
full of data (10 hexadecimal, or 16 bytes). The next four
digits indicate the memory location in the buffer. After a
double 0 there are 16 bytes of data, each represented by two
hexadecimal digits, and then a checksum byte at the very
end. For our purposes the first nine characters and the last
two digits are of no use. The Intel-Hex file is already in
ASCII format.
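
The record layout described above can be sketched in Python
as follows. This is an illustration, not the MBASIC program
of Appendix F.5. The function name is hypothetical, and the
sample record is a generic, valid Intel-Hex line rather than
data from this research.

```python
def parse_record(line):
    """Split one Intel-Hex record into count, address, type, and data."""
    line = line.strip()
    if not line.startswith(":"):
        raise ValueError("an Intel-Hex record starts with a colon")
    raw = bytes.fromhex(line[1:])
    count = raw[0]                              # number of data bytes
    address = (raw[1] << 8) | raw[2]            # memory location
    record_type = raw[3]                        # 00 = data, 01 = end of file
    data = list(raw[4:4 + count])               # integer values 0..255
    # All bytes, including the checksum, must sum to zero modulo 256.
    if sum(raw[:5 + count]) & 0xFF != 0:
        raise ValueError("checksum mismatch")
    return count, address, record_type, data

sample = ":10010000214601360121470136007EFE09D2190140"
count, address, record_type, data = parse_record(sample)

# Equivalent shortcut: drop the first nine characters and the last two.
data_text = sample[9:-2]
```

The shortcut in the last line leaves 32 hexadecimal digits,
two for each of the 16 data bytes.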

An Osborne microcomputer was used to transfer data from
the Zenith 100 formatted floppy disk to a Kaypro formatted
floppy disk. Since the data is needed in integer form to do
the necessary processing, a program was written [App. F.5],
in Microsoft Basic (MBASIC), to convert the data
files from hexadecimal into the equivalent integer values.
Finally, the data is ready for processing. Since the
software was already written on the IBM-3033 to process and
display the data, it was sent there via a 1200 baud modem,
and processed.


APPENDIX F.5 - INTELHEX TO DECIMAL DATA CONVERSION


This program is designed to read a data file that is in
Intelhex format and convert it to an integer file.


1 REM CONVERT AN INTEL-HEX FILE TO DECIMAL INTEGERS
2 PRINT "INPUT FILE ":INPUT FI$
4 PRINT "OUTPUT FILE ":INPUT FO$
20 OPEN "O",2,FO$
30 OPEN "I",1,FI$
40 INPUT #1,IN$
50 REM A BYTE COUNT OF "00" MARKS THE END-OF-FILE RECORD
60 IF MID$(IN$,2,2)="00" THEN CLOSE:GOTO 140
65 REM DATA BYTES OCCUPY CHARACTERS 10 THROUGH 41 OF EACH RECORD
70 FOR I=10 TO 40 STEP 2
80 HX$=MID$(IN$,I,2)
90 V%=VAL("&H"+HX$)
95 IF I=40 THEN PRINT #2,USING "###";V% ELSE PRINT #2,USING "###";V%;
98 IF I=40 THEN PRINT USING "###";V% ELSE PRINT USING "###";V%;
100 NEXT I
125 PRINT
130 GOTO 40
140 PRINT "DONE"+CHR$(7)


APPENDIX G - READY-E

This is an example of the sampled utterance 'Ready'.


APPENDIX H - SO WHAT

This is an example of the sampled utterance 'So What'.


APPENDIX I - SNEEZE-F

This is an example of the utterance 'Sneeze'.

[Plot axis: time in sec.]


APPENDIX J - SPEECH RESOLUTION SUMMARY


This table lists the actual ranges used by each
utterance out of a possible 256 levels (from 0 to 255).


UTTERANCE   SCALE REFERENCE   RANGE
READY       MIDDLE-C          ? - 220
            D                 ? - ?
            E                 ? - 255
            F                 ? - 223
            G                 ? - 233
            A                 10 - 220
            B                 23 - 230
            HIGH-C            ? - ?
SO WHAT     MIDDLE-C          ? - ?
            D                 10 - 220
            E                 ? - ?
            F                 ? - ?
            G                 0 - ?
            A                 8 - 207
            B                 0 - ?
            HIGH-C            0 - 255
SNEEZE      MIDDLE-C          65 - 180
            D                 48 - 255
            E                 ? - ?
            F                 ? - 235
            G                 45 - 210
            A                 35 - 220
            B                 30 - 230
            HIGH-C            ? - 225


It yields a new set of K's every 10 ms.


APPENDIX L - REFLECTION COEFFICIENT PATTERNS FOR SNEEZE-E

[Plot axes: reflection coefficient magnitude versus time (x 10 msec).]


Figure L.1. Sneeze-E Pattern of Reflection Coefficient K1.
Figure L.2. Sneeze-E Pattern of Reflection Coefficient K2.
Figure L.3. Sneeze-E Pattern of Reflection Coefficient K3.
Figure L.4. Sneeze-E Pattern of Reflection Coefficient K4.
Figure L.6. Sneeze-E Pattern of Reflection Coefficient K6.


APPENDIX M - TREND ANALYSIS RESULTS

The following lists are the observations made on the
reflection coefficient curves for each utterance.


"READY"

K1 - All pitches have relatively flat curves. The
magnitudes vary slightly between +.8 and +1.0. The higher
the pitch, the more defined the troughs are.
K2 - These curves all had the unique feature of sloping
upward. They generally ranged from -.4 to +.9. No other
correlation was noted.
K3 - A negative sloping tendency characterized this set of
curves.
K4 - Each of these curves had a plateau. Ready-B, however,
did not fit in with this set at all.
K5 - These curves seemed to stay within a similar range, 23
to -.7. Also several prominent peaks were uncorrelated.
K6 - No correlations were noted, however Ready-B was
drastically different.


"SNEEZE"

K1 - Relatively flat curves. Ranges from .8 to 1.0.
K2 - Highly uncorrelated curves.
K3 - Also highly uncorrelated curves, however, more flat
than K2.
K4 - MC and D are similarly flat, the rest seem correlated
with a valley to an elevated flat plateau.
K5 - There seems to be a peak, then a declining trend in
most of these curves. Again MC and D don't fit this
observation and are generally flat.
K6 - There are several peaks, then relatively flat curves.


"SO WHAT"

K1 - Similarly flat patterns.
K2 - Highly uncorrelated with no recognizable patterns.
K3 - There is a prominent valley in all of the observations
except A.
K4 - Highly uncorrelated with no recognizable patterns.
K5 - Highly uncorrelated with no recognizable patterns.
K6 - Highly uncorrelated with no recognizable patterns.


APPENDIX N - FAST FOURIER TRANSFORM PROGRAM 
This program determines if there are any discrete
frequencies existing within the reflection coefficient
patterns.


APPENDIX O - FREQUENCY CONTENT OF K(N)
This is an example of the output from the FFT program to
determine if there are any discrete frequencies present in
the reflection coefficient patterns.

Figure O.1. Reflection Coefficient K6 for
Utterance 'Ready-MC'.

Figure O.2. Reflection Coefficient K3 for
Utterance 'Ready-MC'.

Figure O.3.
Reflection Coefficient Ка for 
Utterance 'Sneeze-MC'.

[Plot axes: FFT magnitude versus frequency (Hz).]


Figure O.4. Reflection Coefficient K1 for
Utterance 'Sneeze-MC'.

Figure O.6. Reflection Coefficient K4 for
Utterance 'So What-MC'.